78 research outputs found
Metascheduling of HPC Jobs in Day-Ahead Electricity Markets
High performance grid computing is a key enabler of large scale collaborative
computational science. With the promise of exascale computing, high performance
grid systems are expected to incur electricity bills that grow super-linearly
over time. In order to achieve cost effectiveness in these systems, it is
essential for the scheduling algorithms to exploit electricity price
variations, both in space and time, that are prevalent in the dynamic
electricity price markets. In this paper, we present a metascheduling algorithm
to optimize the placement of jobs in a compute grid which consumes electricity
from the day-ahead wholesale market. We formulate the scheduling problem as a
Minimum Cost Maximum Flow problem and leverage queue waiting time and
electricity price predictions to accurately estimate the cost of job execution
at a system. Using trace based simulation with real and synthetic workload
traces, and real electricity price data sets, we demonstrate our approach on
two currently operational grids, XSEDE and NorduGrid. Our experimental setup
collectively constitute more than 433K processors spread across 58 compute
systems in 17 geographically distributed locations. Experiments show that our
approach simultaneously optimizes the total electricity cost and the average
response time of the grid, without being unfair to users of the local batch
systems.Comment: Appears in IEEE Transactions on Parallel and Distributed System
A Preemption-Based Meta-Scheduling System for Distributed Computing
This research aims at designing and building a scheduling framework for distributed computing systems with the primary objectives of providing fast response times to the users, delivering high system throughput and accommodating maximum number of applications into the systems. The author claims that the above mentioned objectives are the most important objectives for scheduling in recent distributed computing systems, especially Grid computing environments.
In order to achieve the objectives of the scheduling framework, the scheduler employs arbitration of application-level schedules and preemption of executing jobs under certain conditions. In application-level scheduling, the user develops a schedule for his application using an execution model that simulates the execution behavior of the application. Since application-level scheduling can seriously impede the performance of the system, the scheduling framework developed in this research arbitrates between different application-level schedules corresponding to different applications to provide fair system usage for all applications and balance the interests of different applications. In this sense, the scheduling framework is not a classical scheduling system, but a meta-scheduling system that interacts with the application-level schedulers.
Due to the large system dynamics involved in Grid computing systems, the ability to preempt executing jobs becomes a necessity. The meta-scheduler described in this dissertation employs well defined scheduling policies to preempt and migrate executing applications. In order to provide the users with the capability to make their applications preemptible, a user-level check-pointing library called SRS (Stop-Restart Software) was also developed by this research. The SRS library is different from many user-level check-pointing libraries since it allows reconfiguration of applications between migrations. This reconfiguration can be achieved by changing the processor configuration and/or data distribution.
The experimental results provided in this dissertation demonstrates the utility of the metascheduling framework for distributed computing systems. And lastly, the metascheduling framework was put to practical use by building a Grid computing system called GradSolve. GradSolve is a flexible system and it allows the application library writers to upload applications with different capabilities into the system. GradSolve is also unique with respect to maintaining traces of the execution of the applications and using the traces for subsequent executions of the application
PartRePer-MPI: Combining Fault Tolerance and Performance for MPI Applications
As we have entered Exascale computing, the faults in high-performance systems
are expected to increase considerably. To compensate for a higher failure rate,
the standard checkpoint/restart technique would need to create checkpoints at a
much higher frequency resulting in an excessive amount of overhead which would
not be sustainable for many scientific applications. Replication allows for
fast recovery from failures by simply dropping the failed processes and using
their replicas to continue the regular operation of the application.
In this paper, we have implemented PartRePer-MPI, a novel fault-tolerant MPI
library that adopts partial replication of some of the launched MPI processes
in order to provide resilience from failures. The novelty of our work is that
it combines both fault tolerance, due to the use of the User Level Failure
Mitigation (ULFM) framework in the Open MPI library, and high performance, due
to the use of communication protocols in the native MPI library that is
generally fine-tuned for specific HPC platforms. We have implemented efficient
and parallel communication strategies with computational and replica processes,
and our library can seamlessly provide fault tolerance support to an existing
MPI application. Our experiments using seven NAS Parallel Benchmarks and two
scientific applications show that the failure-free overheads in PartRePer-MPI
when compared to the baseline MVAPICH2, are only up to 6.4% for the NAS
parallel benchmarks and up to 9.7% for the scientific applications
Holistic Slowdown Driven Scheduling and Resource Management for Malleable Jobs
In job scheduling, the concept of malleability has been explored since many
years ago. Research shows that malleability improves system performance, but
its utilization in HPC never became widespread. The causes are the difficulty
in developing malleable applications, and the lack of support and integration
of the different layers of the HPC software stack. However, in the last years,
malleability in job scheduling is becoming more critical because of the
increasing complexity of hardware and workloads. In this context, using nodes
in an exclusive mode is not always the most efficient solution as in
traditional HPC jobs, where applications were highly tuned for static
allocations, but offering zero flexibility to dynamic executions. This paper
proposes a new holistic, dynamic job scheduling policy, Slowdown Driven
(SD-Policy), which exploits the malleability of applications as the key
technology to reduce the average slowdown and response time of jobs. SD-Policy
is based on backfill and node sharing. It applies malleability to running jobs
to make room for jobs that will run with a reduced set of resources, only when
the estimated slowdown improves over the static approach. We implemented
SD-Policy in SLURM and evaluated it in a real production environment, and with
a simulator using workloads of up to 198K jobs. Results show better resource
utilization with the reduction of makespan, response time, slowdown, and energy
consumption, up to respectively 7%, 50%, 70%, and 6%, for the evaluated
workloads
- …